Refining skin lesions classification performance using geometric features of superpixels

This paper introduces superpixels to enhance the detection of skin lesions and to discriminate between melanoma and nevi without false negatives in dermoscopy images. An improved Simple Linear Iterative Clustering (iSLIC) superpixel algorithm for image segmentation is proposed. The local graph cut method is adopted to identify the region of interest (i.e., either the nevus or the melanoma lesion). The iSLIC algorithm is then used to segment the superpixels (SPs). iSLIC discards all the SPs belonging to the image background based on assigned labels and preserves the segmented skin lesions. Shape and geometric features are extracted for each segmented SP. The extracted features are fed into six machine learning algorithms (random forest, support vector machine, AdaBoost, k-nearest neighbor, decision tree (DT) and Gaussian Naïve Bayes) and three neural networks (pattern recognition neural network, feedforward neural network and 1D convolutional neural network) for classification. The method is evaluated on the 7-Point, MED-NODE and PAD-UFES-20 datasets and the results are compared with state-of-the-art findings. Extensive experiments show that the proposed method outperforms the compared existing methods in terms of accuracy.

melanoma. The features analyzed included circularity and irregularity. Hybrid features covering geometrical shape, color and texture were introduced by Mukherjee et al. 36 to improve the classification accuracy in malignant melanoma detection. The extracted features are classified using the classical SVM, KNN and Ensemble Boosted Tree classifiers. In ref. 11, a large number of shape features, such as asymmetry, diameter, border and color, were used to generate the total dermoscopic value (TDS). The classification is done by using the TDS score. In summary, both ML algorithms and DL techniques combine local and global features for an improved and accurate classification. They require more training data or pre-trained models, and their main limitation is computational complexity. The SP approach proposed in our paper uses neither a local nor a global representation, but contributes to dimensionality reduction, which gives less computation time and improved accuracy. We adopt the hypothesis that a robust and consistent SP segmentation, an effective SP selection and a quantitative SP shape feature extraction can enhance the accuracy of melanoma detection and classification. Also, the SPs are generated using an unsupervised method, keep low-level details and downscale the image, with the direct consequence of a low computational cost.
This paper utilizes the SP segmentation method for melanoma detection and classification proposed in the state of the art, along with some original processing that improves classification. A well-established local graph cut algorithm is adopted to identify the regions of interest (ROIs) in dermoscopy images. An improved Simple Linear Iterative Clustering (iSLIC) algorithm is used to segment SPs. Specifically, iSLIC selects the SPs of interest in the segmentation map by means of labelling and discards those SPs belonging to the background. Then, the obtained map is used for the SPs' feature extraction, so that the spatial context of skin lesions serves as additional information to improve the melanoma classification. Using SPs instead of individual pixels improves the computational efficiency and the performance of classification, as the SPs contain pixels that are largely identical in features and belong to the same class. Moreover, this approach is hybrid by nature and improves the image description by means of the extracted features. We extracted six types of features for each SP. All the features were normalized. Then, six ML tools (RF, AD, DT, GNB, KNN and SVM) 14–21 and three NNs (PRNN, FNN and 1D CNN) were trained using the selected features to classify SPs for melanoma detection. The performance of the proposed approach is measured quantitatively based on the following metrics: accuracy, recall, F1-score, precision and Matthews correlation coefficient (MCC).
Our contributions compared to other state-of-the-art approaches are summarized as follows:
1. An improved Simple Linear Iterative Clustering (iSLIC) algorithm is proposed for melanoma and nevi classification in dermoscopy images, with two new elements:
a. A new algorithm that assigns labels to SPs, which allows the background SPs to be eliminated;
b. A shape analysis of the SPs that generates the input vectors for classification. The geometric features (perimeter, area, eccentricity, orientation, convex area and major axis length) of the SPs were computed and fed to nine classifiers.
2. The validation framework covers three datasets.
3. To the best of our knowledge, we are the first to apply the segmented SPs to the shape and geometric feature extraction and classification task.

Proposed framework
In this section, we describe in detail our approach for skin lesion classification. A conceptual block diagram of our pipeline and the main stages of our research are illustrated in Fig. 1. In the first stage, the image dataset is built using images provided by the 7-Point, MED-NODE and PAD-UFES-20 databases. During the preprocessing stage, artefacts such as hair and noise are removed as they could adversely affect the segmentation performance. The pre-processed images are segmented using the local graph cut method. Thanks to this step, possible hidden confounders and so-called 'shortcut learning' are avoided, and the subsequent analysis focuses only on the regions of interest. For SP generation, we applied the iSLIC algorithm to the identified regions of interest, which reduces the complexity of the further processing steps. The segmented SPs generated from nevi and melanoma lesion images allow the computation of relevant shape and geometric features. The normalized feature data is then fed into the ML and NN tools for classification.

Results and discussion
In this section, six ML classifiers and three NN models were used to test the performance of skin lesion classification based on the features of SPs. The iSLIC segmentation method explores the spatial correlation of pixels, exposes the low-dimensional structure of pixels and, finally, improves the classification results. Experiments with different numbers of SPs show that n = 100 provides high classification accuracy at a relatively low dimension. Three datasets, named 7-Point, MED-NODE and PAD-UFES-20, are used for experimental investigation.
The SP approach is focused on the data distribution to avoid insufficient and low-quality data. Overall, the experiment indicates that the SP segmentation approach is useful for better classification accuracy. The main reason is that a large number of SPs can be generated from a small number of dermoscopic images and then used as training samples. These SPs carry both low-level details and high-level information, as the pixels inside the SP do not need to be labeled. First, we analyzed the impact of different sizes of training and testing sets. The SP datasets were split into training and test sets using the ratios 0.7:0.3, 0.8:0.2 and 0.85:0.15 (training:testing). Fivefold cross-validation was used to prevent overfitting and to increase the accuracy of classification in the training set. The accuracy is provided as the mean of the accuracies of the fivefold models. For each classifier and dataset, the details of the confusion matrix and performance metrics are shown in Figs. 2 and 3. Figure 2 displays the best accuracy, precision, sensitivity, F1-score and MCC scores obtained for the RF, AD and DT algorithms. From Fig. 3, it is noted that the 1D CNN model outperforms the PRNN and FNN models in terms of accuracy, precision, sensitivity, F1-score and MCC. Figure 2 presents the classification results on the test datasets (i.e., 30% of the samples in the SP dataset). The best classification performance was achieved by the RF, AD and DT classifiers, with accuracy higher than 0.95, recall, F1-score and precision higher than 0.94 and MCC higher than 0.89. The GNB, KNN and SVM algorithms performed poorly, with accuracy < 0.86, recall < 0.83, F1-score of 0.71, precision of 0.85 and MCC lower than 0.72. In Fig. 3, the classification performance on the test datasets is provided for each NN model.
It is noted that an accuracy of 0.984, recall of 0.984, F1-score of 0.984, precision of 0.984 and MCC of 0.968 are achieved by the 1D CNN classifier for the 7-Point dataset, while the best performance is obtained for the MED-NODE and PAD-UFES-20 datasets. It is important to highlight that the high performance metric values were measured on the test set and are not the result of overfitting. The declared goal of our research, to discriminate between melanoma and nevi without false negatives, was achieved through the use of the 1D CNN. This model classifies all the positive samples as positive and does not misclassify any negative sample as positive in the case of the MED-NODE and PAD-UFES-20 datasets.
The achieved accuracy of the proposed method is sufficient for a fair comparison with other published techniques. For the RF, DT and AD algorithms, improved results are reported, while the SVM classifier provided results that are in line with other reported results, although it is less performant than the other classifiers in our experiment. Table 1 presents the achieved performance of a few relevant methods. Some authors used only one ML tool while others combined several tools. The researchers used classifiers such as SVM, KNN, DT and RF for melanoma diagnosis, with accuracies ranging from 48.39% to 92.1%. In these studies, the SVM classifier generally gives the best accuracy.
The classification of skin lesion images is a very challenging task, mostly due to the low contrast, which makes it difficult to differentiate the border of the lesion, and due to the need for extensive training, small interclass variation, multi-shape images and imbalanced classes in the datasets. The last issue is a critical one and usually has a strong influence on the performance of the SVM and KNN classifiers. These algorithms require very large data sets 47. Although SPs dramatically increase the number of input samples in our experiment, the classification performance of SVM, KNN and GNB is lower than before but generally still satisfactory.
Also, various DL models are employed by researchers to achieve good performance. There is a noticeable trend in the number of studies using DL for melanoma detection. When a large number of images is provided, DL yields superior performance. Nevertheless, DL techniques must be used with caution, as the datasets often have insufficient samples to allow sound learning of characteristics that vary significantly. Limited training data, overfitting and/or underfitting of the model cause misclassification and reduce the effectiveness of the approaches. However, NN models are more appropriate to solve the classification problem and they can even outperform other traditional techniques. When the problem of insufficient training data is overcome, the accuracy of classification is slightly higher than that of the machine learning approach. It is worth mentioning that, in the proposed work, the number of generated SPs allows for balanced classes in the datasets and leads to strong generalization. The results show that the 1D CNN obtained an accuracy of 100% for the MED-NODE and PAD-UFES-20 datasets and 98.48% for the 7-Point dataset, which is better than the machine learning algorithms used.
The proposed method has some limitations. The SLIC superpixel segmentation algorithm is limited in how well it preserves boundaries. It is well known that the quality of the boundary depends on the performance of the SP segmentation algorithm. This issue has been addressed by proposing the improved SP segmentation algorithm, namely iSLIC. The results are very good considering that dermoscopic images themselves suffer from low contrast, which makes it difficult to differentiate the border of the lesions. Another limitation is the heuristic approach through which the CNN architectures were developed. The CNN was trained and progressively refined by changing the hyperparameters until this was proven experimentally to improve the detection rates. A pre-trained CNN model was not applicable in this study because it is capable of discriminating images of different object classes, but may be less effective in discerning the difference between different textures in the same object. In future works, the knowledge acquired in this stage should be used from the beginning in the design of handcrafted deep convolutional network architectures.

Conclusion
This study proposes to classify skin lesion images by using SP features. The goal was to find the best-performing artificial intelligence tools capable of differentiating skin lesions. Instead of extracting features from images, we proposed an SP approach for feature extraction. The SPs were generated using an improved Simple Linear Iterative Clustering (iSLIC) algorithm, and shape and geometric features were gathered from each segmented SP. Six machine learning algorithms and three neural network models were used for classification, and the best model for skin lesion classification was determined to be the 1D CNN. This model is a better alternative to ML techniques, providing better results overall. The SP approach strongly increases the number of input data for classification, so that the more SPs are available, the higher the classification accuracy obtained. Finally, comparative experiments demonstrate that our proposed method outperforms the compared methods in terms of accuracy. The proposed method explores a more practical approach, as it is able to pick useful features from multiple regions containing groups of pixels that look similar and belong to the same skin lesion image. In future works, we intend to test the proposed method on hybrid features based on shape, color and texture and also to use more image databases.

Methods
Databases. Three image databases, 7-Point, MED-NODE and PAD-UFES-20, are evaluated in the experiments. Class distribution statistics of the datasets are provided in Table 2.
Algorithm 1 (n = 100) runs for each dataset and augments the number of processed objects, as follows:
• MED-NODE: 170 images generate 13,073 SPs;

The proposed approach performs a robust and consistent SP segmentation, an effective SP selection and a quantitative SP shape feature extraction. The proposed iSLIC selects the SPs of interest in the segmentation map, and this map is then used for the SPs' feature extraction. iSLIC makes it possible to assess multiple regions from the same image, each containing groups of pixels that look similar, for diagnosing melanoma. In order to test the generalizability of the algorithm, various classifiers are used. Algorithm 1 generates SPs from the image datasets and allows the extraction of shape and geometric features. The normalized features feed the ML tools (RF, AD, DT, GNB, KNN and SVM) and the neural networks (PRNN, FNN and 1D CNN) in order to output the category for each SP. The superpixels_procedure() contains the iSLIC algorithm and properties_procedure() includes the tools for computing the perimeter, area, eccentricity, orientation, convex area and major axis length. These features are denoted as Ai, Pi, Ei, Oi, ENi, CVi. The variable n is the number of SPs intended to be generated (n = 100), Li is a label matrix of type double and Ni is the number of SPs actually computed. To discard the SPs that belong to the background, a black mask is generated. The if…then structure imposes the condition that only labelled SPs are kept. These SPs belong to the mask. The rest of the SPs are discarded.

Splitting dataset and data normalization. The SP data generated from the 7-Point, MED-NODE and PAD-UFES-20 databases is split into training and test sets using the ratios 0.7:0.3, 0.8:0.2 and 0.85:0.15 (training:testing). These ratios allow the models to learn and adapt to various scenarios and to overcome overfitting.
To avoid data leakage, the normalization is applied after the data has been partitioned into the training and test sets, by using the z-score normalization 40:

$$a' = \frac{a - \mu}{\sigma}$$

where µ and σ are the mean and standard deviation values, respectively, of the vector A, a is each record from vector A to be normalized and a' is the result of normalization.
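The splitting and z-score normalization steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that µ and σ are estimated on the training partition and reused for the test partition (a common way to avoid leakage, which the text does not spell out), it uses the population standard deviation, and the function names and the area values are hypothetical.

```python
import random
import statistics

def split_train_test(samples, train_ratio=0.7, seed=0):
    """Shuffle a list of samples and split it by the given training ratio."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_ratio * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def fit_zscore(column):
    """Estimate (mu, sigma) from the training values only (assumption)."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)
    return mu, sigma

def apply_zscore(column, mu, sigma):
    """Apply a' = (a - mu) / sigma; a constant feature is mapped to 0."""
    if sigma == 0:
        return [0.0 for _ in column]
    return [(a - mu) / sigma for a in column]

# Hypothetical SP areas in pixels; these are not values from the paper.
areas = [120.0, 95.0, 130.0, 110.0, 101.0, 140.0, 99.0, 125.0, 115.0, 105.0]

train, test = split_train_test(areas, train_ratio=0.7)
mu, sigma = fit_zscore(train)
train_norm = apply_zscore(train, mu, sigma)
test_norm = apply_zscore(test, mu, sigma)  # reuses the training statistics
```

With this choice the normalized training feature has zero mean and unit standard deviation, while the test feature is only approximately standardized, which is the expected behaviour when test statistics are not used.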
Geometric and shape features. In this paper, the goal is to segment, based on SP computation, the skin lesion from pre-processed dermoscopic images. The segmented SPs are useful for relevant feature extraction; they keep the low-level details and downscale the image, a direct consequence being the low computation cost. Because the SPs retain low-level information, they also limit the loss of details. Figure 4 illustrates the SP segmentation results for two sampled images belonging to the 7-Point database. The first row, denoted (a), shows the SP segmentation results for nevi; the second row (b) shows them for melanoma; rows (c) and (d) display the results of segmentation where all background SPs are removed. iSLIC generates an over-segmented image with approximately n pieces. The third to the last columns (i.e., a3 to a6 and b3 to b6) present the SP-segmented images for several n values (n = 50, 100, 150, 200). The impact of different numbers of SPs on the computational cost of classification has been evaluated. The performance of classification is not greatly affected by the number of generated SPs, but the best classification results are obtained for n = 100.
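The removal of background SPs mentioned above can be illustrated with a small label-and-mask sketch. It is a simplified stand-in for the selection step, not the iSLIC algorithm itself: the function name, the `min_overlap` parameter and the toy label map and mask are hypothetical, and the rule "keep an SP when enough of its pixels lie inside the lesion mask" is an assumption about how the label test could be expressed.

```python
def select_lesion_superpixels(labels, mask, min_overlap=1.0):
    """Return the SP labels whose pixels fall inside the lesion mask.

    labels: 2D list of integer SP labels (one label per pixel).
    mask: 2D list of 0/1 values, where 1 marks the region of interest.
    min_overlap: fraction of an SP's pixels that must lie inside the mask
    (hypothetical parameter; 1.0 keeps only SPs fully inside the ROI).
    """
    total, inside = {}, {}
    for label_row, mask_row in zip(labels, mask):
        for label, m in zip(label_row, mask_row):
            total[label] = total.get(label, 0) + 1
            inside[label] = inside.get(label, 0) + (1 if m else 0)
    return sorted(l for l in total if inside[l] / total[l] >= min_overlap)

# Toy 4x4 label map with four SPs and a lesion mask on the right side.
labels = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]

kept = select_lesion_superpixels(labels, mask)              # strict rule
kept_loose = select_lesion_superpixels(labels, mask, 0.25)  # looser rule
```

In this toy case the strict rule keeps SPs 2 and 4, and the looser rule additionally keeps SP 3, which touches the lesion with one of its four pixels. Only the kept SPs would then be passed on to feature extraction.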
The features computed in our framework include the perimeter, area, eccentricity, orientation, convex area and major axis length 37. The area and perimeter are the most relevant shape features 10–12. Table 3 exemplifies the features extracted from an SP generated inside the skin lesion.
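As an illustration of how some of these features can be derived, the sketch below computes the area, major axis length, eccentricity and orientation of one SP from the second-order central moments of its pixel coordinates, i.e. from the ellipse with the same second moments as the region. It is an assumption-laden sketch rather than the toolbox routine used in the study: perimeter and convex area are omitted, the function name is hypothetical, and library implementations may apply additional pixel-size corrections, so their values can differ slightly.

```python
import math

def shape_features(pixels):
    """Moment-based shape features of one SP given as (row, col) tuples."""
    area = len(pixels)
    # Image rows grow downward, so the sign is flipped to get a usual y-axis.
    xs = [c for _, c in pixels]
    ys = [-r for r, _ in pixels]
    mean_x = sum(xs) / area
    mean_y = sum(ys) / area
    # Normalized central second moments (covariance of pixel coordinates).
    mu_xx = sum((x - mean_x) ** 2 for x in xs) / area
    mu_yy = sum((y - mean_y) ** 2 for y in ys) / area
    mu_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / area
    # Eigenvalues of the covariance matrix describe the fitted ellipse.
    half_trace = (mu_xx + mu_yy) / 2
    spread = math.sqrt(((mu_xx - mu_yy) / 2) ** 2 + mu_xy ** 2)
    lam_major, lam_minor = half_trace + spread, half_trace - spread
    major_axis_length = 4 * math.sqrt(lam_major)
    if lam_major > 0:
        eccentricity = math.sqrt(max(0.0, 1 - lam_minor / lam_major))
    else:
        eccentricity = 0.0  # a single pixel has no elongation
    # Angle between the x-axis and the major axis, in (-90, 90] degrees.
    orientation = math.degrees(0.5 * math.atan2(2 * mu_xy, mu_xx - mu_yy))
    return {
        "area": area,
        "major_axis_length": major_axis_length,
        "eccentricity": eccentricity,
        "orientation": orientation,
    }

# Toy regions: a horizontal line, a rising diagonal and a 3x3 square.
line = shape_features([(0, c) for c in range(5)])
diagonal = shape_features([(4 - i, i) for i in range(5)])
square = shape_features([(r, c) for r in range(3) for c in range(3)])
```

The horizontal line and the diagonal are maximally elongated (eccentricity 1) with orientations of 0° and 45°, while the square has eccentricity 0, which matches the stated ranges of 0 to 1 for eccentricity and −90° to 90° for orientation.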

Machine learning classifiers.
The classification is the most important step in skin cancer recognition as it distinguishes among the skin lesions, and ML algorithms perform very well in this area. However, it is difficult to offer a fair and accurate comparison among different ML methods as they use various hyperparameters, different variants and several datasets. The most popular ML techniques are AD, SVM and DT 14–20, but KNN, RF and GNB classifiers are also mentioned regularly. This study focuses, in the first part, on the classification provided by the well-known ML classifiers RF, AD, DT, GNB, KNN and SVM.
• Random forest (RF) is a technique based on decision tree algorithms and generalizes the classification process using two types of randomization: at the tree level and at the node level. The first ensures that each tree is fed by a bootstrap sample, while the second refers to a subset of feature dimensions that is randomly selected from the original dimensions 14.
• Support vector machine (SVM) is a linear classifier that can separate two classes by finding the maximum-margin separating hyperplane between them 10,15. To obtain the best classification accuracy, choosing the right kernel is essential 16,17.
• AdaBoost (AD) is a meta-estimator in ML used as an ensemble method. Usually, decision trees with one level (decision stumps) are used with AD. AD is employed for tackling binary classification problems, so it needs to assign a weight to each record. Initially, the weights are equal. For each feature, a decision stump is generated and the Gini index of each tree is calculated. The tree having the lowest Gini index becomes the first stump 15. The iterative process stops when a low training error is achieved.
• K-nearest neighbor (KNN) is a non-parametric supervised learning classifier. It classifies data based on their nearest neighbors. For a correct classification, the input parameters are K, the number of nearest neighbors, and d, the distance between neighbors. The Euclidean, Hamming, Manhattan and Minkowski distances are the ones usually used 18.
• Decision tree (DT) is a non-parametric supervised learning method that uses hierarchical trees for classification 18,19. DT repeatedly splits the data set according to a criterion that maximizes the separation of the data 20.
• Gaussian Naïve Bayes (GNB) is a supervised learning classifier based on Bayes' theorem for probabilistic classification. GNB uses the maximum likelihood method to estimate the values of the mean and standard deviation. GNB starts with the "naïve" assumption of conditional independence between the features obtained from the melanoma and nevi samples 20. This assumption is not always true but, using the generative learning mechanism, the model is able to predict the posterior probability.
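To make the role of K and the distance d in the KNN classifier concrete, the sketch below implements a majority vote over the K nearest training samples with the Euclidean distance. It is a minimal illustration and not the configuration used in the study: the function name, the two-feature vectors and the class labels are hypothetical, and the feature values stand in for normalized SP features.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest
    training samples, using the Euclidean distance."""
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical normalized feature vectors (e.g. area, eccentricity).
train_x = [
    (-1.0, -0.8), (-0.9, -1.1), (-1.2, -0.7),  # nevus-like SPs
    (1.0, 0.9), (1.1, 1.2), (0.8, 1.0),        # melanoma-like SPs
]
train_y = ["nevus", "nevus", "nevus", "melanoma", "melanoma", "melanoma"]

pred_a = knn_predict(train_x, train_y, (0.9, 1.0), k=3)
pred_b = knn_predict(train_x, train_y, (-1.0, -1.0), k=3)
```

Replacing `math.dist` with a Manhattan or Minkowski distance changes only the sorting key, which is why the distance is treated as a hyperparameter alongside K.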
The hyperparameters in ML classifiers that control the learning process are shown in Table 4.

Table 3. Shape and geometric features.
Feature | Description
Area | The area is the number of pixels in a region. It is marked in black
Perimeter | The perimeter is the distance between each adjoining pair of pixels around the border of the SP. It is marked in black
Orientation | The orientation is the angle between the x-axis and the major axis of the ellipse fitting the SP. Its values range from −90° to 90°
Major axis length | The length of the major axis of the fit-ellipse
Eccentricity | It is the ratio of the distance between the foci of the fit-ellipse and its major axis length. The value is between 0 and 1
Convex area | It is the number of pixels in a convex image

Table 4. Hyperparameters in proposed machine learning models. Where the hyperparameters are not explicitly defined, they are considered as default.

26,28–30,32,41,42. The classification was performed using the well-established and standardized databases where the images are stored in different modalities. During the classification process employing deep learning methods, the main drawbacks to be overcome are related to having access to little training data, overfitting and underfitting of the models. Also, some challenges occur when complex and/or rare features are considered. A wide range of skin features is available to researchers. Feature extraction and selection reduce the size of the input vector for AI models and also reduce the computational load. The present study uses, in the second part, three neural networks to classify the extracted SP features.
• Pattern Recognition Neural Networks (PRNNs) are feedforward networks that make decisions from complex patterns of information. In the hidden layer, a pattern that is not linearly separable is transformed into a higher-dimensional space in which it is more linearly separable.
• Feedforward Neural Networks (FNNs) are neural networks with a simple architecture where the input is processed in only one direction. The node connections do not form a cycle.
• 1-Dimensional Convolutional Neural Networks (1D-CNNs) are preferable to their 2D counterparts in certain applications, as their computational complexity is significantly lower and their compact configurations are easier to implement and relatively quick to train 43,44. They have an increased capability to find a function of fixed complexity that approximates the nonlinear relationships between variables.
The hyperparameter values used in the NN models are highlighted in Table 5. The NN hyperparameters were selected around the values suggested by KerasTuner, a general-purpose hyperparameter tuning library. The Adam optimization algorithm used for training the NNs is simple to use and computationally efficient. Binary cross-entropy is used as the loss function. The models are trained for 100 epochs, which gives a low loss without overfitting. The learning rate is initially set to 0.01.
The configurations of the used NN architectures are shown in Fig. 5.
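The binary cross-entropy loss mentioned above can be written out explicitly for a batch of predictions. The sketch below is a plain-Python illustration of the standard formula, not the Keras implementation used for training: the function name, the clipping constant `eps` and the example probabilities are hypothetical.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted
    probabilities of the positive class."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Hypothetical labels (1 = melanoma, 0 = nevus) and predicted probabilities.
loss_confident = binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.8])
loss_wrong = binary_cross_entropy([1, 0, 1], [0.1, 0.9, 0.2])
```

Confident correct predictions give a loss close to zero, whereas confident wrong predictions are penalized heavily, which is the signal the optimizer minimizes over the training epochs.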
Accuracy represents the number of correctly predicted samples relative to the total number of predictions 22–25:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP} \qquad (2)$$

Precision is the ratio of correctly predicted positives to the total number of samples predicted as positive.
Recall is the ratio of correctly predicted positive samples to the sum of the true positive and false negative cases. It neglects how the negative samples are classified.
The F1-score is the harmonic mean of precision and recall. A maximum F1 value indicates that the classification model has an optimal balance of recall and precision.
For imbalanced datasets, the accuracy and F1-score measures can provide overoptimistic, inflated results. The Matthews correlation coefficient (MCC) is used to overcome this issue. It produces a high score only if the classifier correctly predicts most of the positive and most of the negative data samples 45. In particular, MCC remains reliable when there are many TP samples but few TNs (or vice versa). In this case, F1-score and accuracy can provide spurious information.
Here, TP (true positive) represents the number of positive samples that are predicted to be positive, TN (true negative) represents the number of negative samples that are predicted to be negative, FP (false positive) represents the number of negative samples that are predicted to be positive and FN (false negative) represents the number of positive samples that are predicted to be negative.
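The five metrics described above can all be computed from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from the paper, and returning 0 when a denominator is zero is a convention chosen here for illustration.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1-score and MCC from the counts of
    true/false positives and negatives."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }

# Hypothetical balanced case with a few errors.
balanced = classification_metrics(tp=45, tn=50, fp=2, fn=3)

# Hypothetical imbalanced case: every sample is predicted positive.
all_positive = classification_metrics(tp=95, tn=0, fp=5, fn=0)
```

In the second case accuracy (0.95) and F1-score (about 0.97) look excellent although no negative sample is recognized, while MCC is 0, which illustrates why MCC is reported alongside the other metrics for imbalanced data.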